Job Description:
This position is ideal for someone with extensive experience in production support, monitoring, and incident resolution, combined with strong collaboration skills and an analytical mindset. Experience with Splunk, IBM Instana, and SQL is key for success, while familiarity with cloud technologies like Google Cloud and AWS is a plus.
Duties and Responsibilities:
• Incident Management: Triage and evaluate the impact of production incidents, identify root causes, and resolve issues effectively.
• Dashboard Development: Design, develop, and deploy dashboard solutions and multi-dimensional reports using Splunk logs and IBM Instana to visualize and analyze system health and performance.
• Monitoring & Alerting: Work with the team to assess monitoring and alerting needs, with a primary focus on Splunk logs, to ensure early detection of potential issues.
• Data Analysis: Analyze databases and Splunk logs to determine the impact of incidents, ensuring minimal downtime and optimal system performance.
• Early Detection: Use Splunk logs and other monitoring tools to detect abnormalities early, preventing larger-scale issues.
• Cloud Analysis: Review data in Google Cloud Platform (BigQuery) for irregularities and implement strategies for improvement.
• Collaboration: Actively participate in production meetings to discuss issues and solutions with cross-functional teams.
Must-Haves:
• Experience with Splunk (5-7 years): Strong understanding of and proficiency in Splunk, particularly in writing complex queries to analyze logs and detect issues.
• Experience with APM Tools (IBM Instana or Similar) (5-7 years): Familiarity with monitoring tools like IBM Instana, New Relic, or Datadog for application performance monitoring and troubleshooting.
• SQL Expertise (5-7 years): Proficiency in SQL for analyzing large datasets and deriving actionable insights.
• Analytical Skills: Strong skills in analyzing data using Excel and translating that data into actionable solutions.
• Collaboration Skills: Ability to work effectively within a team and communicate complex technical issues clearly with stakeholders.
Nice to Have:
• Google Cloud Platform (BigQuery) Experience: Experience with Google Cloud Platform, specifically BigQuery, to identify issues in cloud environments.
• Programming Knowledge: Familiarity with Java or other high-level programming languages to better understand code behavior and system design.
• Cloud Architecture Knowledge: Understanding of cloud architecture (preferably AWS) and how it impacts production environments.
• Enterprise Application Architecture: Familiarity with best practices in enterprise application architecture design and development to ensure scalable and efficient systems.